{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# Early stopping of Gradient Boosting\n\n\nGradient boosting is an ensembling technique where several weak learners\n(regression trees) are combined to yield a powerful single model, in an\niterative fashion.\n\nEarly stopping support in Gradient Boosting enables us to find the least number\nof iterations which is sufficient to build a model that generalizes well to\nunseen data.\n\nThe concept of early stopping is simple. We specify a ``validation_fraction``\nwhich denotes the fraction of the whole dataset that will be kept aside from\ntraining to assess the validation loss of the model. The gradient boosting\nmodel is trained using the training set and evaluated using the validation set.\nWhen each additional stage of regression tree is added, the validation set is\nused to score the model.  This is continued until the scores of the model in\nthe last ``n_iter_no_change`` stages do not improve by atleast `tol`. After\nthat the model is considered to have converged and further addition of stages\nis \"stopped early\".\n\nThe number of stages of the final model is available at the attribute\n``n_estimators_``.\n\nThis example illustrates how the early stopping can used in the\n:class:`sklearn.ensemble.GradientBoostingClassifier` model to achieve\nalmost the same accuracy as compared to a model built without early stopping\nusing many fewer estimators. This can significantly reduce training time,\nmemory usage and prediction latency.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Authors: Vighnesh Birodkar <vighneshbirodkar@nyu.edu>\n#          Raghav RV <rvraghav93@gmail.com>\n# License: BSD 3 clause\n\nimport time\n\nimport numpy as np\nimport matplotlib.pyplot as plt\n\nfrom sklearn import ensemble\nfrom sklearn import datasets\nfrom sklearn.model_selection import train_test_split\n\nprint(__doc__)\n\ndata_list = [datasets.load_iris(), datasets.load_digits()]\ndata_list = [(d.data, d.target) for d in data_list]\ndata_list += [datasets.make_hastie_10_2()]\nnames = ['Iris Data', 'Digits Data', 'Hastie Data']\n\nn_gb = []\nscore_gb = []\ntime_gb = []\nn_gbes = []\nscore_gbes = []\ntime_gbes = []\n\nn_estimators = 500\n\nfor X, y in data_list:\n    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,\n                                                        random_state=0)\n\n    # We specify that if the scores don't improve by atleast 0.01 for the last\n    # 10 stages, stop fitting additional stages\n    gbes = ensemble.GradientBoostingClassifier(n_estimators=n_estimators,\n                                               validation_fraction=0.2,\n                                               n_iter_no_change=5, tol=0.01,\n                                               random_state=0)\n    gb = ensemble.GradientBoostingClassifier(n_estimators=n_estimators,\n                                             random_state=0)\n    start = time.time()\n    gb.fit(X_train, y_train)\n    time_gb.append(time.time() - start)\n\n    start = time.time()\n    gbes.fit(X_train, y_train)\n    time_gbes.append(time.time() - start)\n\n    score_gb.append(gb.score(X_test, y_test))\n    score_gbes.append(gbes.score(X_test, y_test))\n\n    n_gb.append(gb.n_estimators_)\n    n_gbes.append(gbes.n_estimators_)\n\nbar_width = 0.2\nn = len(data_list)\nindex = np.arange(0, n * bar_width, bar_width) * 2.5\nindex = index[0:n]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Compare scores with and without early stopping\n----------------------------------------------\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "plt.figure(figsize=(9, 5))\n\nbar1 = plt.bar(index, score_gb, bar_width, label='Without early stopping',\n               color='crimson')\nbar2 = plt.bar(index + bar_width, score_gbes, bar_width,\n               label='With early stopping', color='coral')\n\nplt.xticks(index + bar_width, names)\nplt.yticks(np.arange(0, 1.3, 0.1))\n\n\ndef autolabel(rects, n_estimators):\n    \"\"\"\n    Attach a text label above each bar displaying n_estimators of each model\n    \"\"\"\n    for i, rect in enumerate(rects):\n        plt.text(rect.get_x() + rect.get_width() / 2.,\n                 1.05 * rect.get_height(), 'n_est=%d' % n_estimators[i],\n                 ha='center', va='bottom')\n\n\nautolabel(bar1, n_gb)\nautolabel(bar2, n_gbes)\n\nplt.ylim([0, 1.3])\nplt.legend(loc='best')\nplt.grid(True)\n\nplt.xlabel('Datasets')\nplt.ylabel('Test score')\n\nplt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Compare fit times with and without early stopping\n-------------------------------------------------\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "plt.figure(figsize=(9, 5))\n\nbar1 = plt.bar(index, time_gb, bar_width, label='Without early stopping',\n               color='crimson')\nbar2 = plt.bar(index + bar_width, time_gbes, bar_width,\n               label='With early stopping', color='coral')\n\nmax_y = np.amax(np.maximum(time_gb, time_gbes))\n\nplt.xticks(index + bar_width, names)\nplt.yticks(np.linspace(0, 1.3 * max_y, 13))\n\nautolabel(bar1, n_gb)\nautolabel(bar2, n_gbes)\n\nplt.ylim([0, 1.3 * max_y])\nplt.legend(loc='best')\nplt.grid(True)\n\nplt.xlabel('Datasets')\nplt.ylabel('Fit Time')\n\nplt.show()"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.9"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}